On Finding Optimal Policies for Markovian Decision Processes Using Simulation

Authors

  • Apostolos N. Burnetas
  • Michael N. Katehakis

Abstract

A simulation method is developed to find an optimal policy for a Markovian Decision Process under the expected average reward criterion. It is shown that the method is consistent, in the sense that it produces solutions arbitrarily close to the optimal one. Various types of estimation errors are examined, and bounds on them are developed.
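The abstract does not spell out the procedure. As a rough illustration only, the Python sketch below shows one generic simulation-based approach. It is a certainty-equivalence scheme chosen here as an assumption, not necessarily the authors' method. Transitions are sampled from a simulator, the transition probabilities are estimated by empirical frequencies, and the estimated model is then solved for the average-reward criterion with relative value iteration. The 2-state MDP (`P_TRUE`, `R`) and every function name are hypothetical. Rewards are assumed known, and the model is assumed unichain and aperiodic so that relative value iteration converges.

```python
import random

# Hypothetical 2-state, 2-action MDP used only for illustration (assumption).
# P_TRUE[s][a][j] = probability of moving from state s to state j under action a.
P_TRUE = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
# R[s][a] = expected one-step reward for action a in state s (assumed known).
R = [[1.0, 0.0], [2.0, 3.0]]


def simulate_next_state(s, a, rng):
    """Black-box simulator: sample the next state from the true model."""
    return rng.choices(range(len(P_TRUE)), weights=P_TRUE[s][a])[0]


def estimate_model(n_samples, rng):
    """Estimate transition probabilities from n_samples simulated steps per (s, a)."""
    n = len(P_TRUE)
    p_hat = []
    for s in range(n):
        row = []
        for a in range(len(P_TRUE[s])):
            counts = [0] * n
            for _ in range(n_samples):
                counts[simulate_next_state(s, a, rng)] += 1
            row.append([c / n_samples for c in counts])
        p_hat.append(row)
    return p_hat


def relative_value_iteration(P, R, ref=0, tol=1e-10, max_iter=100_000):
    """Average-reward relative value iteration; returns (gain, greedy policy)."""
    n = len(P)
    h = [0.0] * n
    for _ in range(max_iter):
        # One Bellman backup: q[s][a] = r(s, a) + sum_j p(j | s, a) * h(j).
        q = [[R[s][a] + sum(p * hj for p, hj in zip(P[s][a], h))
              for a in range(len(P[s]))] for s in range(n)]
        th = [max(qs) for qs in q]
        diff = [th[s] - h[s] for s in range(n)]
        h = [th[s] - th[ref] for s in range(n)]  # renormalise so h[ref] = 0
        if max(diff) - min(diff) < tol:  # span test: gain bracketed within tol
            break
    policy = [max(range(len(q[s])), key=q[s].__getitem__) for s in range(n)]
    return (max(diff) + min(diff)) / 2, policy


def average_reward(P, R, policy, iters=10_000):
    """Long-run average reward of a stationary policy, via power iteration on pi."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * P[s][policy[s]][j] for s in range(n)) for j in range(n)]
    return sum(pi[s] * R[s][policy[s]] for s in range(n))


rng = random.Random(0)
g_star, pol_star = relative_value_iteration(P_TRUE, R)   # solve the true model
p_hat = estimate_model(5_000, rng)                       # simulation-based estimate
g_hat, pol_hat = relative_value_iteration(p_hat, R)      # solve the estimated model
print(pol_star, round(g_star, 4))  # [1, 1] 2.6667
print(pol_hat, round(average_reward(P_TRUE, R, pol_hat), 4))
```

Increasing `n_samples` shrinks the estimation error in `p_hat`. For this toy model, the policy computed from the estimate then earns, under the true dynamics, an average reward that matches the optimum. This loosely mirrors the consistency property described in the abstract.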

Similar resources

Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized-learning-automata-based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...

Non-Deterministic Policies In Markovian Processes

Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct adaptive treatment strategies, where a sequence of individualized treatments is learned from clinic...

An Analysis of Direct Reinforcement Learning in Non-Markovian Domains

It is well known that for Markov decision processes, the policies stable under policy iteration and the standard reinforcement learning methods are exactly the optimal policies. In this paper, we investigate the conditions for policy stability in the more general situation when the Markov property cannot be assumed. We show that for a general class of non-Markov decision processes, if actual re...

Percentile Policies for Tracking of Markovian Random Processes with Asymmetric Cost and Observation

Motivated by wide-ranging applications such as video delivery over networks using Multiple Description Codes (MDP), congestion control, rate adaptation, spectrum sharing, provisioning of renewable energy, inventory management and retail, we study the state-tracking of a Markovian random process with a known transition matrix and a finite ordered state set. The decision-maker must select a state...

On-Line Search for Solving Markov Decision Processes via Heuristic Sampling

In the past, Markov Decision Processes (MDPs) have become a standard for solving problems of sequential decision under uncertainty. The usual request in this framework is the computation of an optimal policy that defines the optimal action for every state of the system. For complex MDPs, exact computation of optimal policies is often intractable. Several approaches have been developed...

Publication date: 2012